Abstract

Nuclear magnetic resonance offers an appealing prospect for implementation of quantum computers, because of the long coherence times associated with nuclear spins, and extensive laboratory experience in manipulating the spins with radio frequency pulses. Existing proposals, however, suffer from a signal-to-noise ratio that decays exponentially in the number of qubits in the quantum computer. This places a severe limit on the size of the computations that can be performed by such a computer; estimates of that limit are well within the range in which a conventional computer taking exponentially more steps would still be practical.

We give an NMR implementation in which the signal-to-noise ratio 
depends only on features of NMR technology, not the size of the computer. 
This provides a means for NMR computation techniques to scale to sizes 
at which the exponential speedup enables quantum computation to solve 
problems beyond the capabilities of classical computers. 

A sequence of results over the last decade [?, 2] has provided the first credible challenge to the widely accepted notion that all physically "reasonable" computer models are roughly computationally equivalent, i.e. that a problem is tractable (e.g. solvable in polynomial time) in one model if and only if it is tractable in another. At issue is the ability of computers based on quantum physics to perform certain computations (such as factorization [?]) exponentially faster than classical computers. However, realizing quantum computation in the laboratory has proved to be a formidable challenge, since it requires isolating the computer from environmentally induced decoherence while still being able to operate on its state to perform elementary operations. Nevertheless, several proposals to realize quantum computation in the laboratory have been made, using a variety of systems such as cavity quantum electrodynamics [13, ?, ?], trapped ions [?], and most recently nuclear magnetic resonance [15, 9].

The last proposal is particularly interesting for a number of reasons. Nuclear spins exhibit long relaxation times, with coherence times as long as thousands of seconds [?]. Moreover, NMR laboratory techniques routinely manipulate nuclear spins with sequences of hundreds of radio frequency pulses, and therefore provide a very attractive setting for carrying out a sequence of computational steps. However, for NMR techniques to be useful in quantum computation, there is a major obstacle that has to be overcome [17, ?]: initializing the system in (or near) a known initial state (say $|0^n\rangle$). The difficulty is that conventional NMR systems use macroscopic samples, which at room temperature and in thermal equilibrium must be regarded as constituting a statistical mixture of pure states. Of course, if single nuclear spins could be individually addressed, this state preparation problem could be solved [?]. However, this appears to be quite difficult to realize.

A major breakthrough in the use of NMR techniques in quantum computation came about in [15, 9], where schemes for performing small-scale NMR quantum computation using bulk samples were first introduced. The main idea in [?] is to embed a small-dimensional "virtual" pure state within the density matrix describing the bulk sample, by exploiting the structure present in thermal equilibrium. Solving the initial state preparation problem in this way paves the way for experimental realization of quantum computation using off-the-shelf equipment for conventional pulsed NMR. Indeed, this approach has been used in the laboratory to implement 2-qubit (quantum bit) prototypes of a quantum computer: over 100 consecutive logic steps were performed on a 2-qubit computer, and the basic steps of Grover's search algorithm were demonstrated [16, ?, ?]. Although this approach provides a very important "proof of concept" demonstration for quantum computation, it does not scale: the strength of the signal output by the NMR quantum computer degrades exponentially in the number n of qubits in the system. Thus the exponential speedup promised by quantum computation is offset by an exponential increase in the effort required to detect the output signal. The most optimistic predictions are that the output signal will be undetectable for computers of about 30 qubits. Quantum computations of this size could be simulated quite efficiently on conventional computers.

In this paper, we give a new technique for preparing the initial state of 
the NMR system, where the output signal strength does not degrade as the 
number of qubits in the system is increased. We believe this is therefore the 
first proposal for a quantum computer which has long decoherence time, scales 
to large numbers of qubits, and does not suffer a corresponding decay in signal 
strength. 

NMR technology requires a "bulk" sample in order to create a readable signal in the output coils. A quantum computer will need to use enough macromolecules to create this signal; this is a matter for experimental considerations, which we will not discuss here. However, we avoid increases in sample size related to the complexity of the computation; this has the desirable consequence that a relatively small sample, which may be subjected to extreme conditions (of cold, magnetic field, etc.), may offer further opportunities for increasing the efficiency of the process.

Polarization Process 

Before describing our proposal, we briefly discuss the physical setting. The outline of the implementation is founded in standard liquid NMR physics, although, as the calculations will indicate, development of a useful NMR quantum computer will require more.

We consider a collection of macromolecules, each containing n atoms with nuclear spin 1/2 and nuclear magnetic moment $\mu$, suspended in a liquid medium at temperature T, so that the relaxation (coherence) time between the particles and the surrounding liquid is on the order of seconds or thousands of seconds.

The liquid is subjected to a magnetic field B. Upon reaching thermal equilibrium, the difference between the fraction of particles oriented in the direction of the field and those oriented in the opposite direction is

$$\epsilon \approx \frac{\mu B}{kT},$$

where k is Boltzmann's constant, approximately $10^{-16}$ in CGS units.

A typical magnetic field B is approximately $10^5$ gauss. A nuclear magnetic moment such as that of the proton is approximately $10^{-23}$ in CGS units. At room temperature (T = 300 K), with an especially strong magnet, we can therefore obtain $\epsilon \approx 3 \times 10^{-5}$.
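As a quick sanity check of these orders of magnitude, the Python sketch below evaluates the bias. It assumes the standard two-level result $\epsilon = \tanh(\mu B/kT)$, whose small-argument form is the $\mu B/kT$ used above, and it plugs in the rounded CGS constants quoted in the text rather than precise physical values.

```python
import math

# Rounded CGS values quoted in the text (order-of-magnitude only).
MU_PROTON = 1e-23     # nuclear magnetic moment of the proton, erg/gauss
K_BOLTZMANN = 1e-16   # Boltzmann's constant, erg/kelvin


def polarization_bias(mu, B, k, T):
    """Thermal-equilibrium bias of a spin-1/2 ensemble.

    The population difference is tanh(mu*B / (k*T)); for the tiny arguments
    met at room temperature this is essentially mu*B / (k*T).
    """
    return math.tanh(mu * B / (k * T))


eps_room = polarization_bias(MU_PROTON, 1e5, K_BOLTZMANN, 300.0)
eps_cold = polarization_bias(MU_PROTON, 1e5, K_BOLTZMANN, 0.1)
print(f"T = 300 K: eps ~ {eps_room:.1e}")
print(f"T = 0.1 K: eps ~ {eps_cold:.1e}")
```

The same function covers the low-temperature estimate discussed next, since only T changes.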

As will be explained later, the number of qubits on which a quantum computation can be performed is approximately $\epsilon^2 n$, where n is the number of spin-1/2 particles in the macromolecule we employ as our quantum computer. For $\epsilon$ in the range obtained above, a quantum computation on a useful number of qubits (e.g. $10^2$) would require an impractically large macromolecule, of size about $10^{11}$.

Hence it is imperative to create a stronger initial polarization $\epsilon$. An obvious parameter to consider is temperature. Reducing the temperature to $10^{-1}$ K gives $\epsilon \approx 10^{-1}$, so quantum computations on $10^2$ qubits become possible using a molecule of size about $10^4$. However, it is difficult to obtain long coherence times at these low temperatures.

Perhaps a more promising avenue is the use of optical pumping techniques for boosting the value of $\epsilon$. Until recently this technique has been confined to atomic gases, particularly xenon [?, ?]; values of $\epsilon$ exceeding 1/2 have been attained. There are plans at IBM to explore these techniques for molecules that may be suitable for quantum computation. With a value of $\epsilon$ in this range, the size of the molecules needed for a quantum computation on $10^2$ qubits would be under $10^3$.
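The three molecule-size estimates above all follow from the rough rule that about $\epsilon^2 n$ usable qubits can be extracted from n spins. A minimal Python sketch of that arithmetic, ignoring constant factors exactly as the text does (the helper name `molecule_size` is ours):

```python
def molecule_size(target_qubits, eps):
    """Rough spin count per molecule if about eps**2 * n clean qubits are extractable.

    Constant factors (such as the constant c of Theorem 1) are ignored, as in the text.
    """
    return target_qubits / eps ** 2


scenarios = [
    ("room temperature, strong magnet", 3e-5),
    ("cooled to 0.1 K", 1e-1),
    ("optical pumping", 0.5),
]
for name, eps in scenarios:
    print(f"{name:<32} eps = {eps:<8g} n ~ {molecule_size(100, eps):.1e}")
```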

In the remainder of the paper we will simply assume that some e has been 
provided by the polarizing process, and from that starting point we will show 
how to initialize the computer so that it can carry out any desired computation. 

Abstract Setting 

We start by describing an abstract computational model of an NMR quantum computer. An "NMR quantum computer" is described by four parameters: n, l, k, and $\epsilon$. Here n is the number of qubits in the computer (the number of spins available for computation in each molecule of the NMR sample).

Initially the n qubits are in a thermal mixture which deviates slightly from a uniform distribution. $\epsilon$ is the bias induced, at the start, by the external polarizing process. Namely, if any given bit of the computer is measured, the probability that $|0\rangle$ is observed is

$$\frac{1+\epsilon}{2}.$$

We assume that the statistical correlation between any two bits on a molecule falls off exponentially with the distance between those bits. l is the "correlation distance": the distance at which the correlation falls below some prescribed threshold such as 1/10. We will use the term $\epsilon$-biased distribution to refer to such a thermal mixture.
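The $\epsilon$-biased model only requires that correlations decay exponentially. One concrete, hypothetical way to generate such data for numerical experiments is a two-state Markov chain in which each bit copies its left neighbour with probability `lam` and is otherwise redrawn from the biased marginal; the correlation at distance d is then exactly `lam**d`. The sketch below illustrates that assumption only; it is not a physical model of spin correlations, and `lam` is our parameter, not one from the text.

```python
import math
import random


def sample_eps_biased_chain(n, eps, lam, rng):
    """Sample n bits with P(bit = 0) = (1 + eps) / 2 and lag-d correlation lam**d.

    Hypothetical correlation model: each bit copies its left neighbour with
    probability lam and is otherwise redrawn from the biased marginal, so the
    marginal is the same at every site and correlations decay exponentially.
    """
    p0 = (1 + eps) / 2

    def fresh():
        return 0 if rng.random() < p0 else 1

    bits = [fresh()]
    for _ in range(n - 1):
        bits.append(bits[-1] if rng.random() < lam else fresh())
    return bits


def correlation_distance(lam, threshold=0.1):
    """Smallest distance d with lam**d < threshold (assumes 0 < lam < 1)."""
    return math.floor(math.log(threshold) / math.log(lam)) + 1


def lag_correlation(bits, d):
    """Empirical Pearson correlation between bits that are d positions apart."""
    xs, ys = bits[:-d], bits[d:]
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / m
    vx = sum((a - mx) ** 2 for a in xs) / m
    vy = sum((b - my) ** 2 for b in ys) / m
    return cov / math.sqrt(vx * vy)


rng = random.Random(0)
bits = sample_eps_biased_chain(100_000, eps=0.2, lam=0.5, rng=rng)
print("fraction of zeros:", bits.count(0) / len(bits))
print("lag-1 correlation:", round(lag_correlation(bits, 1), 3))
print("correlation distance l:", correlation_distance(0.5))
```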

If there were no correlations, the distribution on the bits would be binomial; 
in the more realistic case which we consider, we will be able to obtain all the 
same essential results as if the distribution was binomial. Only the analysis will 
be a little more difficult, and the numbers a little worse, than for the binomial 
distribution. 

Why is it sufficient to specify the distribution that results when we measure 
the n qubits in the computational basis? To properly describe the bulk sample 
in thermal equilibrium, we would have to specify the density matrix associated 
with the bulk sample. Different mixtures of pure states with the same density 
matrix are indistinguishable by any measurement (so long as that measurement 
is applied to the whole ensemble, not to individual members of the ensemble), 
and therefore by any quantum computation followed by a measurement in the 
computational basis. However, we will further restrict the quantum computation 
that we will allow during the state initialization process. The state initialization 
will be carried out by a computation that can only permute the computational 
basis states (i.e. by essentially a classical computation). Under these restrictions, 
it is sufficient to specify only the probability distribution that results when we measure the initial state of the sample in the computational basis. This is because different mixtures of pure states with different density matrices, but with the same resulting probability distribution, yield the same result under a basis-state permutation followed by a measurement in the computational basis. Since at the end of our initialization process we plan to obtain $\Theta(\epsilon^2 n)$ qubits in the all-$|0\rangle$ state, any further (general) quantum computation that is restricted to these qubits yields the same results that it would if started on a $|0\rangle$ state.

In addition to the operation of initializing the thermal mixture to an $\epsilon$-biased distribution, there are four primitive computational operations that an NMR quantum computer supports:

a) Cyclically shift the n bits clockwise or counterclockwise one position. 

b) Apply an arbitrary two bit operation to the first two bits. 

c) Measure the first bit (in some fixed basis). 

d) (For a quantum cellular automaton) For some fixed value of k (depending upon the structure of the molecule chosen for the NMR experiment), apply an arbitrary 2-bit operation to all pairs of bits with indices $jk$ and $jk + 1$, for every integer $j$.

Notes: 1. Operation (a) does not require that the macromolecule have a 
cyclic topology. Our operative assumption is a linear topology. The imple- 
mentation of the cyclic shift operation is given in the "Architecture" section, 
below. 2. As stated at the outset, these operations are a model of an NMR 
quantum computer. It must be understood that there is considerable flexibility 
in the design of the model, and that for the sake of specificity, we have made 
some arbitrary choices; proper choices must eventually be made on the basis of 
experimental considerations. 
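Primitives (a)-(d) can be prototyped classically on a single computational basis state, which is all the initialization procedure needs, since it only permutes basis states. The class below is a hypothetical sketch of that interface (the names `BasisStateTape`, `apply_head`, `swap_gate`, etc. are ours, not the paper's); it uses 0-based indices, so primitive (d) acts on positions `j*k` and `j*k + 1`.

```python
from typing import Callable, List, Tuple

Bit = int
TwoBitOp = Callable[[Bit, Bit], Tuple[Bit, Bit]]


def is_reversible(op: TwoBitOp) -> bool:
    """A classical 2-bit gate permutes the basis states iff its 4 outputs are distinct."""
    return len({op(a, b) for a in (0, 1) for b in (0, 1)}) == 4


class BasisStateTape:
    """Hypothetical classical model of primitives (a)-(d) on one basis state."""

    def __init__(self, bits: List[Bit]):
        self.bits = list(bits)

    def shift(self, clockwise: bool = True) -> None:
        """(a) Cyclic shift: bit i moves to position i+1 (clockwise) or i-1."""
        if clockwise:
            self.bits = self.bits[-1:] + self.bits[:-1]
        else:
            self.bits = self.bits[1:] + self.bits[:1]

    def apply_head(self, op: TwoBitOp) -> None:
        """(b) Apply a reversible 2-bit gate to the first two bits (the tape head)."""
        if not is_reversible(op):
            raise ValueError("gate must permute the four basis states")
        self.bits[0], self.bits[1] = op(self.bits[0], self.bits[1])

    def measure_head(self) -> Bit:
        """(c) Read the first bit (deterministic, because the state is a basis state)."""
        return self.bits[0]

    def apply_all_pairs(self, k: int, op: TwoBitOp) -> None:
        """(d) Apply op to every pair (j*k, j*k + 1); assumes k >= 2 and k divides n."""
        if k < 2 or len(self.bits) % k:
            raise ValueError("need k >= 2 dividing n")
        if not is_reversible(op):
            raise ValueError("gate must permute the four basis states")
        for s in range(0, len(self.bits), k):
            self.bits[s], self.bits[s + 1] = op(self.bits[s], self.bits[s + 1])


def swap_gate(a: Bit, b: Bit) -> Tuple[Bit, Bit]:
    """Exchange the two bits."""
    return b, a


def cnot_gate(a: Bit, b: Bit) -> Tuple[Bit, Bit]:
    """Controlled-NOT with the first bit as control."""
    return a, a ^ b


tape = BasisStateTape([1, 0, 0, 1])
tape.shift()                  # bits become [1, 1, 0, 0]
tape.apply_head(cnot_gate)    # bits become [1, 0, 0, 0]
print(tape.bits, tape.measure_head())
```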

In fact, there can be substantial reward for enriching the above operations. The machine architecture given by operations (a)-(c) corresponds to that of a 1-tape Turing machine. (We will call the site where we can execute arbitrary operations on a pair of bits the "tape head".) Later in the paper, after describing designs which yield operations (a)-(c), we will also briefly describe how a slight variation of the design can in fact yield the equivalent of a 2-tape Turing machine (still on a linear molecule). With such a machine, the running time of our algorithm can be significantly improved.

Overall Scheme 

An ideal NMR quantum computer would have its n-qubit register initialized to $|0^n\rangle$. The main goal of this paper is to describe an efficient simulation of an ideal NMR quantum computer using an NMR quantum computer. Notice that if the bias $\epsilon$ in the initial state of the NMR quantum computer were 0, then the density matrix of the mixture (of the n-qubit computers) would be proportional to the identity, and would therefore remain unchanged by any sequence of computational steps. Hence an NMR quantum computer with parameter $\epsilon = 0$ is incapable of supporting any computation. Our goal is to use the small but constant bias $\epsilon > 0$ to isolate $m = \Theta(\epsilon^2 n)$ qubits such that the reduced density matrix of these m qubits is very close to the density matrix corresponding to the pure state $|0^m\rangle$.

What we need in order to achieve this goal is quite simple: we wish to carry out a permutation of the computational basis states $x \in \{0,1\}^n$ such that states with low Hamming weight are re-encoded with a long prefix of 0's. A similar task has been addressed previously by a quantum computation [?]. However, in that method, the necessary permutations are accomplished with the aid of a quantum computer which already has at its disposal a clean workspace, i.e. a sequence of qubits in a known initial state (of size about $n^{1/2}$). Obtaining such a clean workspace in an NMR computer is precisely the problem which needs to be addressed in order to make NMR quantum computing possible in the first place.
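To make the target of the initialization concrete, the brute-force Python sketch below builds one permutation with the desired property for a toy register: it sorts all n-bit strings by Hamming weight and maps the r-th lightest string to the binary encoding of r, so light strings acquire long 0-prefixes. This only illustrates the goal; it takes exponential time and a global view of the register, which is exactly what the construction in this paper must avoid. The parameters `n = 12`, `eps = 0.5`, `m = 3` are arbitrary choices of ours.

```python
from itertools import product


def weight_sorting_permutation(n):
    """Map the r-th lightest n-bit string (ties broken lexicographically) to the binary encoding of r."""
    order = sorted(product((0, 1), repeat=n), key=lambda s: (sum(s), s))
    return {s: tuple(int(c) for c in format(r, f"0{n}b")) for r, s in enumerate(order)}


def prob_zero_prefix(n, eps, m, perm=None):
    """P(first m bits are all 0) for i.i.d. bits with P(0) = (1 + eps)/2, optionally after perm."""
    p0 = (1 + eps) / 2
    total = 0.0
    for s in product((0, 1), repeat=n):
        w = sum(s)
        image = perm[s] if perm is not None else s
        if not any(image[:m]):
            total += p0 ** (n - w) * (1 - p0) ** w
    return total


n, eps, m = 12, 0.5, 3
perm = weight_sorting_permutation(n)
print("P(3-bit zero prefix) without permutation:", round(prob_zero_prefix(n, eps, m), 4))
print("P(3-bit zero prefix) with permutation:   ", round(prob_zero_prefix(n, eps, m, perm), 4))
```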

In other words, what complicates the construction of these permutations, for us, is that we cannot assume that we have any clean bits at all (i.e. bits whose distribution is almost entirely supported on $|0\rangle$ or $|1\rangle$) to store intermediate results of our computation, since all the available qubits are in the thermal state.
Consequently, and because of the restricted set of primitive operations allowed 
on an NMR quantum computer (necessary because of the physical limitations), 
we are initially hampered in the kinds of logical operations we can implement 
in our computer. 

What we provide is an "end-to-end" procedure: we start with only a string of qubits in a thermal mixture, and we end with a string of qubits that with high probability are all in the $|0\rangle$ state.

Theorem 1 Assume that the thermal mixture is in an $\epsilon$-biased distribution. Then there is a constant c such that, using primitives (a) and (b), we can convert the given mixture to one in which $1 - o(1)$ of the probability is supported on strings which begin with a run of $c\epsilon^2 n$ 0's.

The process which we will describe uses $O(n^2)$ steps.

We will show how to obtain a value of approximately 1/20 for c, i.e. about $20\epsilon^{-2}$ input bits are consumed per clean bit. A slightly more complicated implementation of our method (especially by using blocks of size greater than 2 in phase 2, see below) can improve this constant further.
Proof 

We begin by permuting the bits. If we wish to minimize our reliance on any assumptions concerning the dependencies among spins in the original mixture, then the permutation of $\{1, \ldots, n\}$ is chosen at random, uniformly, by the experimenter. If (as is more likely, and as was assumed in the previous section) we can assume only local correlations, then it is enough to "shuffle" the bits in any predetermined manner that guarantees that all bits that start out close to each other (within distance $n^{1/3}$) end up far apart (at least distance $n^{1/3}$). If we can really assume a binomial distribution on strings, then this initial permutation is unnecessary. Under weaker assumptions, the permutation is necessary in order for the probability bounds of the analysis to be valid.

There are a variety of ways to carry out the permutation; using operations (a) and (b) it can be accomplished without difficulty using (to within a constant factor) the optimum number of transpositions. Typically, and in the worst case, this number will be on the order of $n^2$.
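For the locally correlated case, one predetermined shuffle with the required spreading property is a "matrix transpose": write the positions row by row into a grid with b columns and read them out column by column. The Python sketch below is our own example (the text does not prescribe a specific shuffle); it checks that bits starting fewer than b apart end up at least n/b - 1 apart, which for b close to $n^{1/3}$ is far more than $n^{1/3}$. It assumes b divides n.

```python
def transpose_shuffle(n, b):
    """new_pos[i] for a 'write row-wise, read column-wise' shuffle on a grid with b columns.

    Position i = q*b + r (row q, column r) is sent to r*(n//b) + q.  Assumes b divides n.
    """
    if n % b:
        raise ValueError("b must divide n")
    c = n // b
    return [(i % b) * c + i // b for i in range(n)]


def min_separation(new_pos, window):
    """Smallest new distance between two bits that started fewer than `window` apart."""
    n = len(new_pos)
    return min(abs(new_pos[i] - new_pos[j])
               for i in range(n) for j in range(i + 1, min(n, i + window)))


n, b = 1000, 10                   # b = n ** (1/3)
new_pos = transpose_shuffle(n, b)
print(sorted(new_pos) == list(range(n)), min_separation(new_pos, window=b))
```

The design choice is that neighbours in a row land in different columns, which are $n/b$ apart after the transpose, so no block of $n^{1/3}$ consecutive output positions contains two originally close bits.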

We will analyze weak (i.e. locally correlated) distributions as follows. The initialization algorithm has the property that it partitions the n bits into blocks of size $n^{1/3}$, and each processed bit output by the algorithm depends only on one of the blocks. Now, since the n bits were randomly permuted, with high probability no two bits in any block started out at distance less than $n^{1/3}$. This implies (even under very weak assumptions on the manner in which local correlations decay) that the distribution on each block is very closely approximated by the binomial distribution. (Under the assumption that local correlations decay exponentially in distance, the distribution in the block will have exponentially small distance to the binomial distribution, in the $L_1$ norm.)

After the initial permutation, we carry out the preparation of the initial 
segment of bits. This process will proceed in three phases. 

1. Boosting to constant bias:

In this phase we extract, from n bits with bias $\epsilon$, $\Theta(\epsilon^2 n)$ bits which have large constant (i.e. independent of n) bias. This process is efficient (in terms of how many bits of output are produced) up to a constant factor.

2. Obtaining polynomially small $\delta = (1-\epsilon)/2$ by increasing block sizes.

3. Boosting to obtain a nearly perfect block of bits:

In the final phase, while keeping the block size beneath $n^{1/2}$, we reduce $\delta$ beneath $n^{-10}$. The union bound then implies that a computation can begin, working on the assumption that all bits are 0's, and incur only a polynomially small ($n^{-9}$) probability of error due to possible bad initialization.

Phase 1: Amplification to constant bias 

In phases 1-3 we partition the n bits into blocks of size $n^{1/3}$. All computations of phases 1-3 are conducted internally within these blocks, until after phase 3 the clean bits are finally collected together in one location for use in a subsequent computation. In this way we ensure that we can use near-independence of the bits within each block. If the original probability distribution was binomial (rather than having local correlations), there is no need for this device.

Theorem 1, phase 1: Starting with n $\epsilon$-biased bits, and using operations (a),(b), we can with probability $1 - o(1)$ obtain $\Omega(\epsilon^2 n)$ bits with bias at least 0.856.

We will go through several rounds of amplification; as soon as a bias exceeding 0.856 is achieved, we stop using this process and switch to phase 2.

The amplification scheme is very simple. Partition the bits into pairs. If the bits in a pair are different, discard both; otherwise, discard one. A pair is kept with probability $\left(\frac{1+\epsilon}{2}\right)^2 + \left(\frac{1-\epsilon}{2}\right)^2 = \frac{1+\epsilon^2}{2}$, so the expected bias towards 0 among the surviving bits is $\frac{2\epsilon}{1+\epsilon^2}$, and the expected number of bits that survive is $n\frac{1+\epsilon^2}{4}$. Since the bits are nearly independent (they would be completely independent if the original distribution was binomial), a large deviation bound now implies that with probability at least $1 - e^{-n^{1/3}}$, the number of bits surviving is at least $\frac{n}{4} - n^{2/3}$.
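A short Monte Carlo run makes the two expectations above easy to check. The Python sketch below assumes independent bits (the binomial case) and an arbitrary sample size of our choosing; it compares the observed survival fraction and bias after each round with $(1+\epsilon^2)/4$ and $2\epsilon/(1+\epsilon^2)$.

```python
import random


def amplify_round(bits):
    """One Phase-1 round: pair up bits, drop unequal pairs, keep one bit of each equal pair."""
    return [a for a, b in zip(bits[0::2], bits[1::2]) if a == b]


def bias(bits):
    """Empirical bias towards 0, i.e. P(0) - P(1)."""
    return (bits.count(0) - bits.count(1)) / len(bits)


def predicted_bias(eps):
    """Bias after one round: 2 eps / (1 + eps^2)."""
    return 2 * eps / (1 + eps ** 2)


rng = random.Random(1)
eps = 0.1
bits = [0 if rng.random() < (1 + eps) / 2 else 1 for _ in range(400_000)]
for r in range(5):
    n_before, eps_before = len(bits), bias(bits)
    bits = amplify_round(bits)
    print(f"round {r}: kept {len(bits) / n_before:.3f} "
          f"(predicted {(1 + eps_before ** 2) / 4:.3f}), "
          f"bias {bias(bits):.3f} (predicted {predicted_bias(eps_before):.3f})")
```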

As we go through several (k) rounds, the probability that we wind up with fewer than $n4^{-k}(1 - n^{-1/3})^k$ bits is at most $ke^{-n^{1/3}}$. This is negligible. A slightly more complicated question is: can we wind up with bits of constant (0.856) bias while bounding $4^{-k}$ from below by $\Omega(\epsilon^2)$? A positive answer comes from the following analysis.

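From the formula $\epsilon_{i+1} = \frac{2\epsilon_i}{1+\epsilon_i^2}$ we obtain two things. First,

$$\epsilon_i = 2^i \epsilon_0 \prod_{j=0}^{i-1} \frac{1}{1+\epsilon_j^2}.$$

So we can rephrase our goal: we wish to upper bound $\prod_{j=0}^{k-1}(1+\epsilon_j^2)$ (where $\epsilon_0 = \epsilon$). In an ideal process in which $\epsilon$ doubled in each round, we would need $k = \lg(\epsilon_k/\epsilon_0)$; in the true process we need to increase k over this ideal quantity by $\lg \prod_{j=0}^{k-1}(1+\epsilon_j^2)$. In other words, the multiplicative effect on $4^k$ (over the optimal factor) is at most $\left(\prod_{j=0}^{k-1}(1+\epsilon_j^2)\right)^2$.

Second, inverting the recurrence gives

$$\epsilon_i = \frac{1 - \sqrt{1-\epsilon_{i+1}^2}}{\epsilon_{i+1}}.$$

The remainder of this analysis is broken into two parts: the rounds until $\epsilon > 1/100$, and the remaining rounds until $\epsilon > 0.856$.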
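For the first part we use the inequality

$$x < 0.02 \implies \sqrt{1-x} > 1 - \frac{1}{2}x - \frac{1}{4}x^2$$

to show (taking $x = \epsilon_{i+1}^2$) that

$$\epsilon_i < \frac{1}{2}\epsilon_{i+1}\left(1 + \frac{1}{2}\epsilon_{i+1}^2\right).$$

In particular, note that this implies

$$\epsilon_i < 0.5004\,\epsilon_{i+1}$$

so long as $\epsilon_i$ is beneath our threshold for using this analysis.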

Now, $\prod_j (1 + \epsilon_j^2) < e^{\sum_j \epsilon_j^2}$. Consequently $\prod_j (1 + \epsilon_j^2) < e^{0.02^2 \sum_{i \ge 0} 0.5004^{i}}$, and so the multiplicative effect on $4^k$ in these rounds (the factor for how many bits we are losing) is bounded by $e^{2 \cdot 0.02^2 \sum_{i \ge 0} 0.5004^{i}} < 1.0017$.

In the remaining sequence of rounds we have $0.01 \le \epsilon_j < 0.856$. We obtain an upper bound on $\left(\prod_j (1 + \epsilon_j^2)\right)^2$ by explicitly calculating it, beginning with the term corresponding to 0.856 and working down, until and including the first term that is less than 0.01 (which is the seventh iterate, approximately 0.009985). This product is less than 6.7.
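The numerical constants in this analysis can be reproduced directly. The Python sketch below iterates the inverse map $\epsilon_i = (1-\sqrt{1-\epsilon_{i+1}^2})/\epsilon_{i+1}$ downward from 0.856, and evaluates the first-part bound with the closed form of the geometric series; it only checks the arithmetic and is not part of the protocol.

```python
import math


def forward(eps):
    """One Phase-1 round: eps_{i+1} = 2 eps_i / (1 + eps_i^2)."""
    return 2 * eps / (1 + eps * eps)


def backward(eps_next):
    """Inverse map on (0, 1): eps_i = (1 - sqrt(1 - eps_{i+1}^2)) / eps_{i+1}."""
    return (1 - math.sqrt(1 - eps_next * eps_next)) / eps_next


# Second part: walk down from 0.856 until (and including) the first term below 0.01.
terms = [0.856]
while terms[-1] >= 0.01:
    terms.append(backward(terms[-1]))
loss_second = math.prod(1 + e * e for e in terms) ** 2

# First part: eps_j <= 0.02 * 0.5004**t, bounded via the closed-form geometric sum.
loss_first = math.exp(2 * 0.02 ** 2 / (1 - 0.5004))

print("iterates below 0.856:", len(terms) - 1, "last:", round(terms[-1], 6))
print("second-part factor:", round(loss_second, 3), " first-part factor:", round(loss_first, 5))
```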



Implementation: 

We have to be somewhat careful to implement the amplification scheme using the computational primitives described above. We can think of the machine given by primitives (a),(b) as a Turing machine whose "tape head" is at the site at which arbitrary unitary operations can be implemented on a pair of adjacent bits. We will want to speak of the tape head carrying with it a small "register" of several bits: this is easily implemented by interspersing rotations of the tape with transpositions at the site of the "tape head". We will use a two-bit register labelled $y_1, y_2$.

We will perform the amplification in stages. Start with arbitrary bits in the two-bit register. For m ranging from 1 up to N/2 (where N is the current number of bits left in the process, initially $\Theta(n^{1/3})$), carry out on the m-th pair of bits, which we will call $x_{m,1}, x_{m,2}$, the two-bit operation "are they equal?", namely $|01\rangle \to |11\rangle$, $|11\rangle \to |01\rangle$ (other basis states unchanged). Afterwards $x_{m,1} = 0$ exactly when the two bits were equal, in which case $x_{m,2}$ holds their common value.

Now for m ranging from 1 up to N/2, do the following. Exchange $x_{m,1}$ with $y_1$, and $x_{m,2}$ with $y_2$. Now move the tape head back to the first pair, $x_{1,1}, x_{1,2}$. For i from 1 to m - 1, do the following: if $y_1 = 0$, exchange $y_2$ with $x_{i,2}$. Finally, move the tape head to pair m, and exchange $x_{m,1}$ with $y_1$, and $x_{m,2}$ with $y_2$.

After m reaches N/2, the good bits occupy the second positions $x_{1,2}, x_{2,2}, \ldots$ of the leading pairs. Before the next iteration, for $1 \le j \le \frac{N}{4}(1 - o(1))$ in increasing order, exchange the j-th bit of the block with the bit $x_{j,2}$ (the 2j-th bit of the block). This brings all the "good" bits to the initial segment of length $\frac{N}{4}(1 - o(1))$. This will be the value of N in the next stage. (The $1 - o(1)$ term, derived from a law of large numbers, is chosen so that with high probability all bits in the segment are in fact "good" bits.)

The total number of steps in all stages of all rounds is quadratic in the block size, hence $O(n^{2/3})$.

At the end of the process, the $\Theta(n^{1/3}\epsilon^2)$ good bits lie in a segment at the start of the block.
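The following Python sketch simulates one stage of this procedure on a single classical basis state, using only exchanges and exchanges controlled by $y_1$, so every step is a reversible, oblivious permutation. Assumptions of the sketch: 0-based indexing (pair m occupies positions 2m and 2m+1), and the number g of good bits is passed to the compaction step explicitly, whereas the procedure itself uses the high-probability estimate $\frac{N}{4}(1-o(1))$. The good bits emerge in reverse order of discovery, which does not matter since they are interchangeable.

```python
def equality_and_bubble(block, register=(0, 0)):
    """Classical simulation of one Phase-1 stage on a block of N bits (N even).

    Pair m occupies positions 2m, 2m+1 (0-based), i.e. x_{m,1}, x_{m,2}.
    Only swaps and y1-controlled swaps are used.
    """
    x = list(block)
    pairs = len(x) // 2
    # "Are they equal?": x_{m,1} <- x_{m,1} XOR x_{m,2}, so x_{m,1} = 0 iff equal.
    for m in range(pairs):
        x[2 * m] ^= x[2 * m + 1]
    y1, y2 = register
    for m in range(pairs):
        # Load pair m into the register (parking the register contents there).
        x[2 * m], y1 = y1, x[2 * m]
        x[2 * m + 1], y2 = y2, x[2 * m + 1]
        # Sweep pairs 0..m-1: if pair m was equal, rotate y2 through their second bits.
        for i in range(m):
            if y1 == 0:
                x[2 * i + 1], y2 = y2, x[2 * i + 1]
        # Unload the register back into pair m; the register regains its old contents.
        x[2 * m], y1 = y1, x[2 * m]
        x[2 * m + 1], y2 = y2, x[2 * m + 1]
    return x, (y1, y2)


def compact(x, g):
    """For j = 1..g (1-based), exchange the j-th bit with the 2j-th bit."""
    x = list(x)
    for j in range(1, g + 1):
        x[j - 1], x[2 * j - 1] = x[2 * j - 1], x[j - 1]
    return x


block = [0, 0, 1, 0, 1, 1, 0, 1, 0, 0]        # pairs: 00, 10, 11, 01, 00
good = [a for a, b in zip(block[0::2], block[1::2]) if a == b]
x, reg = equality_and_bubble(block)
print(compact(x, len(good))[:len(good)], good[::-1], reg)
```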

Why is it necessary to switch to phase 2 once the bias of the bits is high? Because once the bits have high bias, the bit that is discarded in a phase 1 computation itself has substantial bias. Consequently the method is wasteful: if we continued with phase 1 to the end, the ratio of clean bits obtained to the number we started with would tend to 0 as n grows (rather than being the fixed quantity $\Omega(\epsilon^2)$, independent of n). In phases 2 and 3 we use blocks that, instead of being of the fixed size 2, increase together with the bias. Only one or a constant number of bits are discarded from each block of the computation, so it becomes possible to discard only a small fraction of the bits while still amplifying those that remain.
Phase 2: obtaining polynomially small $\delta$

Theorem 1, phase 2: Starting with n bits of bias at least 0.856, and using operations (a),(b), we can obtain $\Omega(n)$ bits with $\delta < n^{-0.3}$.

This phase will require $O(\log \log n)$ rounds, each using time $O(n^{2/3})$.

We begin with $n_0$ bits, of which most are 0's, but a constant fraction, $\delta_0$, are 1's.

We partition the bits randomly into bins, each of $k_0$ bits $x_1, \ldots, x_{k_0}$. In each bin, we compute the parity of bits $x_1$ through $x_{k_0}$ into bit $x_1$. If $x_1$ equals 1, we do not pass bits $x_2, \ldots, x_{k_0}$ along to the next round; if $x_1$ equals 0, we do. This is repeated for several rounds with varying k. The bins are re-randomized in each round. (All the randomness, again, is provided externally by the experimenter. The computation itself is deterministic. In particular, all tape movements are oblivious.)

Analysis: Let $\delta_0 = b_0/n_0$, where $b_0$ is the number of 1's. The probability that a given bin contains exactly one 1 is (for large $n_0/k_0$) very close to

$$k_0\,\delta_0\,(1-\delta_0)^{k_0-1}.$$

(This is what it would be exactly, for independent sampling with probability $\delta_0$.)

Moreover, for large $n_0/k_0$, there is a law of large numbers saying that the total number of bins containing exactly one 1, call it u, is with high probability very close to its expected value,

$$b_0\,(1-\delta_0)^{k_0-1}.$$

(a) The total number of bits passed along to the next round, $n_1$, is lower bounded by only considering bits from bins which were entirely 0's; this bound (again using a law of large numbers to make a high-probability statement) is

$$n_1 \gtrsim \frac{n_0}{k_0}\,(1-\delta_0)^{k_0}\,(k_0-1).$$

(b) The total number of 1's passed along to the next round is at most $b_0 - u$, which w.h.p. is close to its expectation, so we write

$$b_1 \lesssim b_0\left(1 - (1-\delta_0)^{k_0-1}\right).$$

Now we need to make a good choice of k as a function of $\delta$. Note that $\delta = 0.072$ corresponds to $\epsilon = 0.856$. Our choice is as follows: for $0.0188 < \delta \le 0.072$, select k = 3. For $0.0027 < \delta \le 0.0188$, select k = 7. For $0.000158 < \delta \le 0.0027$, select k = 21. For $\delta \le 0.000158$, select $k = \lceil \delta^{-0.4} \rceil$. Note that in this region k > 33.

In the first of these regions we are guaranteed $n_1/n_0 > 0.532$; in the second, $n_1/n_0 > 0.75$; and in the third, $n_1/n_0 > 0.899$. Each of these regions is encountered at most once in the process.
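In the fourth region, we have

$$\frac{n_1}{n_0} > \frac{k-1}{k}\,(1-\delta_0)^{k}.$$

We also have

$$\delta_1 = \frac{b_1}{n_1} \lesssim \frac{k\,\delta_0\left(1-(1-\delta_0)^{k-1}\right)}{(k-1)(1-\delta_0)^{k}} \le 1.2\,\delta_0^2\,(k-1) \le 1.2\,\delta_0^{1.6}.$$

Since $\delta$ therefore falls superexponentially, the product of the ratios $n_1/n_0$ is dominated by the first round in this region; consequently, over the entire fourth region, $\prod (n_1/n_0) > 0.96$.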
The above iterations halt once we reach a large enough block size, $n^a$ for $0.2 < a < 0.32$. At that point we implement another few iterations using blocks of size $n^{1/3}$ (we can simply use the entire block of bits that is allowed to interact), bringing $\delta$ down close to the stationary point of the iteration $\delta_1 \lesssim \delta^2 k$, i.e. $\delta = n^{-1/3}$; let us say we halt when $\delta < n^{-0.3}$.
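The per-round guarantees of this phase can be checked by iterating the high-probability bounds (a) and (b) from $\delta = 0.072$. In the Python sketch below, the cutoff $\delta < 10^{-12}$ is an arbitrary stand-in for "the block size reaches $n^a$", and `math.log1p`/`math.expm1` are used only to keep the arithmetic accurate for tiny $\delta$.

```python
import math


def choose_k(delta):
    """Bin-size schedule from the text; the last region uses k = ceil(delta**-0.4)."""
    if delta > 0.0188:
        return 3
    if delta > 0.0027:
        return 7
    if delta > 0.000158:
        return 21
    return math.ceil(delta ** -0.4)


def phase2_round(delta, k):
    """High-probability bounds (a) and (b): returns (n1/n0, new delta).

    n1/n0 >= (k-1)/k * (1-delta)**k          (bits from all-zero bins)
    b1/n0 <= delta * (1 - (1-delta)**(k-1))   (1's not removed by a lone-1 bin)
    """
    log_keep = math.log1p(-delta)                  # log(1 - delta), accurate for tiny delta
    yield_ratio = (k - 1) / k * math.exp(k * log_keep)
    ones_ratio = delta * -math.expm1((k - 1) * log_keep)
    return yield_ratio, ones_ratio / yield_ratio


delta, rounds = 0.072, []          # delta = 0.072 corresponds to eps = 0.856
while delta > 1e-12:               # stand-in for "block size reaches n**a"
    k = choose_k(delta)
    ratio, delta = phase2_round(delta, k)
    rounds.append((k, ratio, delta))

for k, ratio, d in rounds:
    print(f"k = {k:>5}: n1/n0 >= {ratio:.4f}, new delta <= {d:.2e}")
print("yield over the k = ceil(delta**-0.4) rounds:",
      round(math.prod(r for _, r, _ in rounds[3:]), 4))
```

Each of the first three regions is indeed visited exactly once, because one round at the top of a region already lands below its lower boundary.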

Phase 3: obtaining $\delta < n^{-10}$

Theorem 1, phase 3: Starting with n bits of bias at least $1 - n^{-0.3}$, and using operations (a),(b), we can obtain $(1 - o(1))n$ bits of bias $1 - n^{-10}$.

This phase will require a constant number of rounds, each using time $O(n^{2/3})$ in each $n^{1/3}$-size block, hence $O(n^{4/3})$ time overall.

Fix blocks of size $k = n^{1/6}$. Now, instead of computing the parity into just one bit, compute it into the first 2 bits, i.e. compute modulo 4 the number of 1's in the block. We now implement the logic gate $(x, y, z) \to (x, y, (x \lor y) \oplus z)$, with x, y and z representing the first three bits of the block. Now, if after this gate the third bit is a 0, we pass the remaining $n^{1/6} - 3$ bits of the block on to the next round. Now that the decision has been encoded in one bit (namely the third bit of the block), this procedure can be implemented in a manner similar to that described for phase 2 (the "decision bit" is carried in the tape head and controls whether or not a permutation is implemented).

We will only pass 1's through to the next round if there are at least 4 of them in the entire block, or any in the first 3 bits. The recurrence for $\delta$ is therefore approximately

$$\delta_1 \lesssim \delta\left(3n^{-1/6} + 3\delta + k^3\delta^3\right).$$

Beginning with the value $\delta < n^{-0.3}$ provided by the previous phase, only a constant number of iterations are required to reduce $\delta$ beneath $n^{-10}$. The total number of bits is reduced only by a $1 - o(1)$ factor.
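The decision rule of this phase can be checked exhaustively on small blocks. The Python sketch below fixes one hypothetical reading of it: $(x, y)$ receives the number of 1's in the whole block modulo 4, z is the block's original third bit, and the remaining bits are passed on when $(x \lor y) \oplus z = 0$. For k = 8 it confirms the property stated above: a block passes 1's on only if it holds at least four of them or one of them sits in the first three positions.

```python
from itertools import product


def phase3_decision(block):
    """Return True if block[3:] is passed on (hypothetical reading of the Phase 3 gate)."""
    count = sum(block) % 4                 # number of 1's in the block, mod 4
    x, y = (count >> 1) & 1, count & 1     # stored in the first two bits
    z = block[2]                           # original third bit
    decision = (x | y) ^ z                 # gate (x, y, z) -> (x, y, (x OR y) XOR z)
    return decision == 0


k = 8
violations = 0
for block in product((0, 1), repeat=k):
    if phase3_decision(block) and any(block[3:]):
        # Passed 1's require >= 4 ones in the block, or a 1 in the first three bits.
        if not (sum(block) >= 4 or any(block[:3])):
            violations += 1
print("all-zero block passes:", phase3_decision((0,) * k), "violations:", violations)
```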

Termination 

At this point, in time proportional to $n^2$, we gather together the remaining bits from all the $n^{1/3}$-size blocks, ready for a subsequent computation. The probability that any of these bits is not 0 is at most $n^{-9}$.
Efficiency: bit yield



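Collecting together the loss factors from phases 1, 2 and 3, we find that

$$\frac{n}{n_{\mathrm{final}}} \le 1.0017 \times 6.7 \times \frac{1}{0.532} \times \frac{1}{0.75} \times \frac{1}{0.899} \times \frac{1}{0.96} \times (1 + o(1)) \times \epsilon^{-2} < 20\,\epsilon^{-2}.$$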
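The product above is plain arithmetic; the Python lines below multiply the per-phase loss factors quoted in the text (the dictionary labels are ours).

```python
import math

# Loss factors quoted in the text (labels are ours).
loss_factors = {
    "phase 1, rounds with eps < 0.01": 1.0017,
    "phase 1, rounds with 0.01 <= eps < 0.856": 6.7,
    "phase 2, k = 3": 1 / 0.532,
    "phase 2, k = 7": 1 / 0.75,
    "phase 2, k = 21": 1 / 0.899,
    "phase 2, k = ceil(delta**-0.4)": 1 / 0.96,
}
total = math.prod(loss_factors.values())
print(f"n / n_final <= {total:.2f} * eps**-2 * (1 + o(1))")
```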



This factor can be improved by using more complex computations. The chief place to obtain gains is in the latter stages of phase 1 and the earlier stages of phase 2; in both cases the way to improve efficiency is to use larger block sizes, and a more complicated permutation within each block, in order to extract a fraction of bits from the block that tends to the optimal fraction.

$\Theta(\epsilon^2 n)$ clean bits is optimal

It was noted above that if $\epsilon = 0$, we cannot prepare any bits at all that are biased toward $|0\rangle$. If $\epsilon > 0$, how many such bits can we hope to prepare? If we ask that with high probability k bits are all 0's, then the central limit theorem places a limit on k of $n\left(1 - H\left(\frac{1+\epsilon}{2}\right)\right)$, where H is the binary entropy function; for small $\epsilon$ this is approximately $n\epsilon^2/(2\ln 2)$, i.e. on the order of $n\epsilon^2$. To prepare just one good bit, therefore, we must use about $\epsilon^{-2}$ bits with bias $\epsilon$.
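The entropy limit and its small-$\epsilon$ approximation can be compared numerically. In the Python sketch below, `binary_entropy` is the standard binary entropy in bits, and the second column uses the leading term $\epsilon^2/(2\ln 2)$ of the expansion of $1 - H\left(\frac{1+\epsilon}{2}\right)$; the sample values of $\epsilon$ are arbitrary.

```python
import math


def binary_entropy(p):
    """H(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def max_clean_fraction(eps):
    """Upper bound on (clean bits) / n: 1 - H((1 + eps) / 2)."""
    return 1 - binary_entropy((1 + eps) / 2)


for eps in (0.5, 0.1, 0.01, 3e-5):
    exact = max_clean_fraction(eps)
    approx = eps ** 2 / (2 * math.log(2))
    print(f"eps = {eps:<7g} 1 - H = {exact:.3e}   eps^2/(2 ln 2) = {approx:.3e}")
```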



Architecture 

We now discuss how the computational primitives (a),(b), and some extensions, 
can be implemented on polymers with certain kinds of periodic structures. 

Turing machine: 

Normally one imagines a Turing machine having a "head" which implements 
computations locally, i.e. involving the state of the "tape" in the vicinity of the 
head. We implement this abstraction (but without any moving parts) in the 
following way. (It must be understood that there is considerable flexibility in 
the design, and that for the sake of specificity, we are making some arbitrary
choices; the proper choices must eventually be made on the basis of experimental 
considerations.) 

The tape will not of course be infinite, but a ring of n qubits. These will be 
realized in the nuclear spins of a linear polymer. The polymer will consist of 
n/3 repetitions of the sequence ABC, thus ABC ABC ABC ABC . . . ; the atoms 
A,B,C have spin 1/2 nuclei. In addition, at one point in the chain, another 
atom, D, is adjacent to the chain, near a neighboring pair of C and A atoms; it 
induces a chemical shift in some of the energy levels at these two neighboring 
atoms. 

(Note: it is not actually necessary for A,B and C to be different types of 
nuclei; they could all be of one kind, if the periodic structure resides in adjacent 
atoms that induce suitable chemical shifts in the energy levels.) 
Five resonant frequencies will be such that we can implement the following
five operations: 

1. Frequency 1: transposition of the qubits in all adjacent AB pairs. 

2. Frequency 2: transposition of the qubits in all adjacent BC pairs. 

3. Frequency 3: transposition of the qubits in all adjacent CA pairs. 

4. Frequencies 4, 5: these resonate only with energy levels shifted by the
presence of atom D. Hence they induce a unitary operator only on the
pair of qubits at the C and A atoms immediately adjacent to atom D. We
assume that combinations of frequencies 4 and 5 generate the group
of all unitary transformations of that 4-dimensional Hilbert space.
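To make the addressing concrete, here is a small sketch that lists which atom pairs each frequency acts on. It assumes an open (linear) chain ABCABC... of n atoms indexed from 0, and a hypothetical parameter `d_site` giving the index of the C atom in the C-A pair next to atom D. It illustrates that the pairs addressed by any one of frequencies 1-3 are disjoint, so a single pulse can perform all of those transpositions in parallel, while frequencies 4 and 5 touch only the pair at D.

```python
from typing import Dict, List, Tuple

LABELS = "ABC"


def chain_labels(n: int) -> str:
    """Atom types along the linear polymer ABCABC... (n must be a multiple of 3)."""
    if n % 3:
        raise ValueError("n must be a multiple of 3")
    return LABELS * (n // 3)


def pairs_for(n: int, first: str, second: str) -> List[Tuple[int, int]]:
    """Adjacent (i, i + 1) pairs whose types are (first, second) on an open chain."""
    s = chain_labels(n)
    return [(i, i + 1) for i in range(n - 1) if s[i] == first and s[i + 1] == second]


def frequency_targets(n: int, d_site: int) -> Dict[int, List[Tuple[int, int]]]:
    """Pairs addressed by each frequency. d_site (hypothetical) is the index of
    the C atom whose C-A pair (d_site, d_site + 1) sits next to atom D."""
    ca = pairs_for(n, "C", "A")
    if (d_site, d_site + 1) not in ca:
        raise ValueError("atom D must sit next to a C-A pair")
    return {
        1: pairs_for(n, "A", "B"),
        2: pairs_for(n, "B", "C"),
        3: ca,
        4: [(d_site, d_site + 1)],  # frequencies 4 and 5 address only this pair
        5: [(d_site, d_site + 1)],
    }


if __name__ == "__main__":
    for freq, targets in frequency_targets(9, d_site=5).items():
        print(freq, targets)
```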

Arbitrary "oblivious" quantum computations can be performed on this machine.
By an "oblivious" computation we mean one in which the sequence of
movements of the tape head is the same in all the superposed "copies" of the
machine in the quantum computation.

A cyclic shift of the tape by one position is implemented by the following
sequence of transpositions: (A, B), (C, A), and then (B, C). (Each such
transposition can be implemented by three CNOT gates: for example, (A, B) can
be implemented by the sequence [A → B], [B → A], [A → B], where [X → Y]
denotes a CNOT with control X and target Y.) A succession of such triples of
transpositions will bring any desired pair of adjoining qubits next to the tape head.
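The following sketch checks the cyclic-shift claim on classical labels. It assumes, following the description of the tape as a linear polymer, that the chain is open: the last C atom has no A partner and the first A atom has no C partner. Under that assumption the composite of the three transposition layers is a single n-cycle, with the tape running forward along the A and C sites and back along the B sites (for n = 9 the tape order of sites is 0, 2, 3, 5, 6, 8, 7, 4, 1). The sketch also checks the three-CNOT construction of a transposition on basis states, which suffices because CNOT and SWAP both permute the computational basis. Function names are illustrative.

```python
from typing import List, Sequence, Tuple


def layer(n: int, offset: int) -> List[Tuple[int, int]]:
    """Disjoint adjacent pairs (i, i + 1) with i % 3 == offset on an open chain.
    offset 0 -> AB pairs, 1 -> BC pairs, 2 -> CA pairs."""
    return [(i, i + 1) for i in range(offset, n - 1, 3)]


def apply_layer(tape: List[str], pairs: Sequence[Tuple[int, int]]) -> List[str]:
    """Swap the contents of every pair simultaneously (the pairs are disjoint)."""
    out = list(tape)
    for i, j in pairs:
        out[i], out[j] = out[j], out[i]
    return out


def shift_once(tape: List[str]) -> List[str]:
    """Apply (A, B), then (C, A), then (B, C) along the whole chain."""
    n = len(tape)
    for offset in (0, 2, 1):  # AB, CA, BC
        tape = apply_layer(tape, layer(n, offset))
    return tape


def tape_order(n: int) -> List[int]:
    """Follow the content of site 0 around: the ring order of the tape."""
    moved = shift_once([str(i) for i in range(n)])
    dest = {int(x): pos for pos, x in enumerate(moved)}  # old site -> new site
    order, p = [0], dest[0]
    while p != 0:
        order.append(p)
        p = dest[p]
    return order


def swap_by_three_cnots(a: int, b: int) -> Tuple[int, int]:
    """[A -> B], [B -> A], [A -> B] acting on the basis state (a, b)."""
    b ^= a  # CNOT, control A, target B
    a ^= b  # CNOT, control B, target A
    b ^= a  # CNOT, control A, target B
    return a, b


if __name__ == "__main__":
    print(tape_order(9))
    print([swap_by_three_cnots(a, b) for a in (0, 1) for b in (0, 1)])
```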

Cellular automaton with distinguished site: 

Lloyd has proposed implementing a quantum cellular automaton. 

We propose an architecture similar to the one described above, but
now we use five kinds of atoms: three (A, B, C) have spin-1/2 nuclei and two
(D, E) induce chemical shifts in the resonant frequencies of nearby atoms of the
first three types. We assume that k | n (k divides n). The ring consists of repetitions of the
pattern ABC; after every k atoms of types A and C, one atom of type D adjoins
the chain and induces local chemical shifts. At one site an E atom adjoins the
chain and induces chemical shifts that are different from those induced by D.

One step of the computation is implemented by a pulse at a frequency that
addresses each D atom together with its two adjacent spin-1/2 atoms. Rotations of the tape
are implemented as above: small rotations allow information to be sent between
adjacent "cells" of the cellular automaton, while global rotations bring the tape
contents past the E site, where individual operations may be implemented.
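A minimal classical sketch of one synchronous step, under simplifying assumptions that are ours rather than the text's: cells are indexed by tape position around the ring (not by atom position), the hypothetical function `d_site_pairs` places one D site per block of k tape cells acting on cells (ik - 1, ik), and the local rule is a CNOT chosen purely for illustration (any permutation of the four basis states would do). Because the D-site pairs are disjoint, one pulse updates every cell at once; a small rotation then changes which bits meet at each D site, which is how information crosses between neighbouring cells.

```python
from typing import Callable, List, Tuple

Rule = Callable[[int, int], Tuple[int, int]]


def d_site_pairs(n: int, k: int) -> List[Tuple[int, int]]:
    """Hypothetical placement: one D site per block of k tape cells, acting on
    cells (i*k - 1, i*k) around the ring. Requires k >= 2 and k | n."""
    if k < 2 or n % k:
        raise ValueError("need k >= 2 and k dividing n")
    return [((i * k - 1) % n, i * k) for i in range(n // k)]


def ca_step(tape: List[int], k: int, rule: Rule) -> List[int]:
    """One pulse: apply the two-bit rule at every D site simultaneously.
    The pairs are disjoint, so reading from the old tape is unambiguous."""
    out = list(tape)
    for i, j in d_site_pairs(len(tape), k):
        out[i], out[j] = rule(tape[i], tape[j])
    return out


def rotate(tape: List[int], steps: int) -> List[int]:
    """Cyclic shift: the content of cell i moves to cell (i + steps) mod n."""
    steps %= len(tape)
    return tape[-steps:] + tape[:-steps] if steps else list(tape)


def cnot(a: int, b: int) -> Tuple[int, int]:
    """Illustrative reversible local rule: control a, target b."""
    return a, a ^ b


if __name__ == "__main__":
    tape = [0, 0, 0, 0, 0, 1]       # n = 6 cells, k = 3, D sites at (5, 0) and (2, 3)
    tape = ca_step(tape, 3, cnot)   # cell 5 controls cell 0: [1, 0, 0, 0, 0, 1]
    tape = rotate(tape, 2)          # the bit from cell 0 now sits at D-site cell 2
    print(tape)
```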

Two-tape Turing machine: 

To implement a two-tape Turing machine we need to enable the head to move 
independently on each of the tapes. Equivalently, in our implementation, we 
need to have two cycles of bits, which can independently be cyclically shifted
past the head. 

Let the molecule consist of n repetitions of the sequence ABCD. (As above,
these are spin-1/2 nuclei, and each type of adjacent pair (AB, BC, CD, DA) can be
addressed with its own distinctive frequency.)

The A and C nuclei will carry one tape, the B and D nuclei the other. (Note 
that the nuclei of any given type carry a contiguous segment of half a tape, not 
every other bit.) 

The sequence of transpositions (AB)(BC)(AB)(CD)(AD)(CD) rotates the 
AC-tape by one position, while leaving the BD-tape fixed. 
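The rotation claim can be checked mechanically. The sketch below assumes an open chain of m ABCD blocks (sites indexed from 0, atom type given by the index mod 4) and applies the transpositions left to right; the names are illustrative. Under these assumptions every B and D site is fixed, and the A and C sites form one cycle A_0 → A_1 → ... → A_{m-1} → C_{m-1} → ... → C_0 → A_0, which matches the remark that each type of nucleus carries a contiguous half of the AC-tape. The first three transpositions amount to exchanging each A with the C of its own block, and the last three to exchanging each C with the A of the next block.

```python
from typing import Dict, List, Tuple

TYPES = "ABCD"
# Transpositions in the order listed in the text, applied left to right.
SEQUENCE = [("A", "B"), ("B", "C"), ("A", "B"), ("C", "D"), ("A", "D"), ("C", "D")]


def pairs(m: int, x: str, y: str) -> List[Tuple[int, int]]:
    """Adjacent sites (i, i + 1) of an open chain of m ABCD blocks whose
    atom types are {x, y} in either order."""
    n = 4 * m
    return [(i, i + 1) for i in range(n - 1)
            if {TYPES[i % 4], TYPES[(i + 1) % 4]} == {x, y}]


def apply_swaps(tape: List[int], ps: List[Tuple[int, int]]) -> List[int]:
    """Swap every (disjoint) pair in parallel."""
    out = list(tape)
    for i, j in ps:
        out[i], out[j] = out[j], out[i]
    return out


def rotate_ac(tape: List[int]) -> List[int]:
    """Apply (AB)(BC)(AB)(CD)(AD)(CD) to a chain of len(tape) // 4 blocks."""
    m = len(tape) // 4
    for x, y in SEQUENCE:
        tape = apply_swaps(tape, pairs(m, x, y))
    return tape


def site_map(m: int) -> Dict[int, int]:
    """Site -> site that its content occupies after one rotate_ac."""
    moved = rotate_ac(list(range(4 * m)))
    return {src: dst for dst, src in enumerate(moved)}


if __name__ == "__main__":
    print(site_map(2))
```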

The most time-consuming stages of our procedure are the initial permutation
of the bits and the final collecting of the clean bits, each requiring time $O(n^2)$.
In fact, these are the only stages that require more than time $O(n^{4/3})$.

The terminal permutation is very simple; the initial permutation can be
very simple as well, so long as we make the "local correlations" assumption
on our initial $\epsilon$-biased distribution, in which case we can use the permutation
which sends bit $r n^{1/3} + s$ (for $0 \le s < n^{1/3}$) to position $(r + s) n^{1/3} + s$,
with $r + s$ taken modulo $n^{2/3}$. In this case, the initial permutation can be performed in time $O(n^{4/3})$, and the
final permutation in linear time, on the 2-tape architecture. Consequently, the
entire procedure can be implemented in time $O(n^{4/3})$. If we further augment
our device by combining the features of a 2-tape machine with those of a cellular
automaton, with $k = n^{1/3}$, then the initial permutation can be performed in
linear time, and in phase 3 and the latter part of phase 2 we can gain time by
working in parallel within each $n^{1/3}$-size block. The overall runtime reduces to
linear.
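A sketch of the initial permutation, assuming n is a perfect cube, block size b = n^{1/3}, and taking r + s modulo n^{2/3} (the number of blocks) so that every destination lies in range; the function names are illustrative. Besides being a bijection, it sends the b bits of any one block to b distinct blocks, which is presumably what makes it suitable under the "local correlations" assumption: bits that start out close together, and hence possibly correlated, end up in different blocks.

```python
from typing import List


def icbrt(n: int) -> int:
    """Exact integer cube root of a perfect cube n >= 1."""
    b = round(n ** (1.0 / 3.0))
    for c in (b - 1, b, b + 1):
        if c > 0 and c ** 3 == n:
            return c
    raise ValueError("n must be a perfect cube")


def spreading_permutation(n: int) -> List[int]:
    """dest[i] for i = r*b + s with b = n^(1/3), 0 <= s < b: position
    ((r + s) mod R) * b + s, where R = n^(2/3) is the number of blocks."""
    b = icbrt(n)
    R = n // b
    dest = []
    for i in range(n):
        r, s = divmod(i, b)
        dest.append(((r + s) % R) * b + s)
    return dest


if __name__ == "__main__":
    print(spreading_permutation(8))
```

For each fixed offset s the map r → (r + s) mod R is a bijection on blocks, which is why the whole map is a permutation.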

Thus there is substantial benefit in implementing slightly stronger primitives 
than the minimal list of operations (a)-(c). 